1. Data

Load the student scores for the test - here we load the ETH Zurich test data, downloaded from https://pontifex.ethz.ch/s21t5/rate/

The scores are:

  • 0 if the student gave an incorrect response
  • 1 if the student gave a correct response
  • 2 if the student chose the “I don’t know” answer option
  • NA if there was no response recorded
test_scores
## # A tibble: 9,671 x 44
##    test_version year  class      A1_B1 A2_B0 A3_B2 A4_B3 A5_B4 A6_B5 A7_B6 A8_B7
##    <chr>        <chr> <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 pre          2017  s21t-000-~     1     0     1     1     1     0     1     0
##  2 pre          2017  s21t-000-~     1     0     1     1     1     1     0     1
##  3 pre          2017  s21t-000-~     1     0     0     0     1     1     1     0
##  4 pre          2017  s21t-000-~     1     0     1     1     1     1     1     1
##  5 pre          2017  s21t-000-~     1     0     1     0     2     0     1     0
##  6 pre          2017  s21t-000-~     0     1     0     0     1     2     0     2
##  7 pre          2017  s21t-000-~     1     0     1     0     2     1     0     1
##  8 pre          2017  s21t-000-~     1     1     1     1     2     1     1     2
##  9 pre          2017  s21t-000-~     1     1     0     1     1     1     1     1
## 10 pre          2017  s21t-000-~     1     0     1     0     0     1     0     0
## # ... with 9,661 more rows, and 33 more variables: A9_B8 <dbl>, A10_B9 <dbl>,
## #   A11_B10 <dbl>, A12_B0 <dbl>, A13_B0 <dbl>, A14_B12 <dbl>, A15_B13 <dbl>,
## #   A16_B14 <dbl>, A17_B15 <dbl>, A18_B16 <dbl>, A19_B0 <dbl>, A20_B17 <dbl>,
## #   A21_B18 <dbl>, A22_B19 <dbl>, A23_B20 <dbl>, A24_B21 <dbl>, A25_B0 <dbl>,
## #   A26_B22 <dbl>, A27_B23 <dbl>, A28_B24 <dbl>, A29_B25 <dbl>, A30_B0 <dbl>,
## #   A31_B27 <dbl>, A32_B28 <dbl>, A33_B29 <dbl>, A34_B0 <dbl>, A35_B0 <dbl>,
## #   A36_B0 <dbl>, A0_B11 <dbl>, A0_B26 <dbl>, A0_B30 <dbl>, A0_B31 <dbl>, ...

For this analysis, we replace the “2 = I don’t know” scores with 0, reflecting a non-correct answer.

test_scores <- test_scores %>% 
  mutate(across(starts_with("A"), ~ ifelse(. == 2, 0, .)))

Data summary

The number of responses from each cohort:

test_scores %>% 
  group_by(year) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
## Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
year n
2017 1682
2018 1751
2019 1999
2020 2191
2021 2048

Mean and standard deviation for each item:

test_scores %>% 
  select(-class, -test_version) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
2017 2018 2019 2020 2021
complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD
A1_B1 100% 0 0.95 0.22 100% 0 0.95 0.21 100% 0 0.96 0.21 100% 0 0.96 0.20 100% 0 0.95 0.21
A2_B0 100% 0 0.31 0.46 100% 0 0.28 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A3_B2 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.67 0.47 100% 0 0.66 0.48 100% 0 0.67 0.47
A4_B3 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.60 0.49
A5_B4 100% 0 0.47 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50
A6_B5 100% 0 0.71 0.46 100% 0 0.73 0.44 100% 0 0.74 0.44 100% 0 0.76 0.42 100% 0 0.74 0.44
A7_B6 100% 0 0.71 0.46 100% 0 0.70 0.46 100% 0 0.69 0.46 100% 0 0.70 0.46 100% 0 0.68 0.47
A8_B7 100% 0 0.47 0.50 100% 0 0.50 0.50 100% 0 0.52 0.50 100% 0 0.60 0.49 100% 0 0.58 0.49
A9_B8 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.43 0.50 100% 0 0.42 0.49
A10_B9 100% 0 0.54 0.50 100% 0 0.54 0.50 100% 0 0.55 0.50 100% 0 0.55 0.50 100% 0 0.53 0.50
A11_B10 100% 0 0.57 0.49 100% 0 0.58 0.49 100% 0 0.58 0.49 100% 0 0.59 0.49 100% 0 0.57 0.50
A12_B0 100% 0 0.68 0.47 100% 0 0.71 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A13_B0 100% 0 0.37 0.48 100% 0 0.36 0.48 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A14_B12 100% 0 0.56 0.50 100% 0 0.57 0.49 100% 0 0.59 0.49 100% 0 0.54 0.50 100% 0 0.54 0.50
A15_B13 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.47 0.50 100% 0 0.46 0.50 100% 0 0.45 0.50
A16_B14 100% 0 0.19 0.39 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40
A17_B15 100% 0 0.59 0.49 100% 0 0.60 0.49 100% 0 0.60 0.49 100% 0 0.59 0.49 100% 0 0.57 0.49
A18_B16 100% 0 0.38 0.49 100% 0 0.36 0.48 100% 0 0.28 0.45 100% 0 0.27 0.45 100% 0 0.28 0.45
A19_B0 100% 0 0.35 0.48 100% 0 0.34 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A20_B17 100% 0 0.72 0.45 100% 0 0.72 0.45 100% 0 0.74 0.44 100% 0 0.71 0.46 100% 0 0.71 0.45
A21_B18 100% 0 0.51 0.50 100% 0 0.52 0.50 100% 0 0.54 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50
A22_B19 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.50 0.50
A23_B20 100% 0 0.41 0.49 100% 0 0.43 0.50 100% 0 0.42 0.49 100% 0 0.42 0.49 100% 0 0.42 0.49
A24_B21 100% 0 0.59 0.49 100% 0 0.58 0.49 100% 0 0.60 0.49 100% 0 0.56 0.50 100% 0 0.57 0.50
A25_B0 100% 0 0.54 0.50 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A26_B22 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.81 0.39 100% 0 0.80 0.40
A27_B23 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.42 0.49 100% 0 0.40 0.49 100% 0 0.41 0.49
A28_B24 100% 0 0.42 0.49 100% 0 0.46 0.50 100% 0 0.45 0.50 100% 0 0.42 0.49 100% 0 0.41 0.49
A29_B25 100% 0 0.48 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50 100% 0 0.49 0.50 100% 0 0.50 0.50
A30_B0 100% 0 0.76 0.43 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A31_B27 100% 0 0.38 0.48 100% 0 0.37 0.48 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.40 0.49
A32_B28 100% 0 0.23 0.42 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.20 0.40
A33_B29 100% 0 0.78 0.42 100% 0 0.76 0.43 100% 0 0.77 0.42 100% 0 0.74 0.44 100% 0 0.75 0.43
A34_B0 100% 0 0.58 0.49 100% 0 0.61 0.49 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A35_B0 100% 0 0.32 0.47 100% 0 0.33 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A36_B0 100% 0 0.22 0.41 100% 0 0.23 0.42 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A0_B11 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.57 0.50 100% 0 0.59 0.49 100% 0 0.58 0.49
A0_B26 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.54 0.50 100% 0 0.53 0.50 100% 0 0.52 0.50
A0_B30 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.61 0.49 100% 0 0.61 0.49 100% 0 0.60 0.49
A0_B31 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.35 0.48 100% 0 0.33 0.47 100% 0 0.45 0.50
A0_B32 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.21 0.41

2. Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Since the combined data on the two versions of the test have large portions of “missing” data (e.g. no responses to new questions from students who completed the old test), it is not possible to carry out the factor analysis as in the analyse-test script, since the factor analysis routine does not work with missing data.

Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.

TODO delete the rest of the contents of this section?!

Local independence

This plot shows the correlations between scores on each pair of items – note that it is restricted to only those items that appear on both versions of the test, since the plotting package did not deal well with missing data:

TODO - check with Mine that this is a valid way to proceed (throwing out questions with missing data)

item_scores <- test_scores %>% 
  select(-class, -year, -test_version)

item_scores_unchanged_only <- item_scores %>% 
  select(!contains("B0")) %>% select(!contains("A0"))

cor_ci <- psych::corCi(item_scores_unchanged_only, plot = FALSE)

psych::cor.plot.upperLowerCi(cor_ci)

TODO - add some notes here to explain how to interpret all of this!

There are a few correlations that are not significantly different from 0:

cor_ci$ci %>% 
  as_tibble(rownames = "corr") %>% 
  filter(p > 0.05) %>% 
  arrange(-p) %>% 
  select(-contains(".e")) %>% 
  gt() %>% 
  fmt_number(columns = 2:4, decimals = 3)
corr lower upper p

Here we redo the correlation calculations with all the items, and check that there are still few cases where the correlations close to 0:

cor_ci2 <- psych::corCi(item_scores, plot = FALSE)
cor_ci2$ci %>% 
  as_tibble(rownames = "corr") %>% 
  filter(p > 0.05) %>% 
  arrange(-p) %>% 
  select(-contains(".e")) %>% 
  gt() %>% 
  fmt_number(columns = 2:4, decimals = 3)
corr lower upper p
A1_B1-A2_B0 −0.006 0.055 0.115

The overall picture is that the item scores are well correlated with each other.

Dimensionality

Here we again focus on the subset of items that appeared in both tests.

structure <- check_factorstructure(item_scores_unchanged_only)
n <- n_factors(item_scores_unchanged_only)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.97).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(351) = 61172.10, p < .001).

Method Agreement Procedure:

The choice of 1 dimensions is supported by 6 (26.09%) methods out of 23 (Acceleration factor, R2, VSS complexity 1, Velicer’s MAP, TLI, RMSEA).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 6
2 1
3 5
4 1
6 2
7 1
10 2
16 1
18 1
25 2
26 1
#n %>% tibble() %>% gt()
fa.parallel(item_scores_unchanged_only, fa = "fa")

## Parallel analysis suggests that the number of factors =  7  and the number of components =  NA

1 Factor

item_scores_unchanged_only_and_no_na <- item_scores_unchanged_only %>%
  mutate(
    across(everything(), ~replace_na(.x, 0))
  )
fitfact <- factanal(item_scores_unchanged_only_and_no_na, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_unchanged_only_and_no_na, factors = 1,     rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.95    0.77    0.80    0.72    0.76    0.77    0.78    0.62    0.65    0.69 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.71    0.88    0.82    0.79    0.84    0.64    0.58    0.61    0.62    0.74 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29 
##    0.70    0.64    0.64    0.87    0.77    0.80    0.88 
## 
## Loadings:
##  [1] 0.53 0.62 0.59 0.56 0.53 0.60 0.64 0.63 0.61 0.51 0.55 0.60 0.60      0.48
## [16] 0.45 0.49 0.48 0.47 0.35 0.42 0.46 0.40 0.36 0.48 0.45 0.35
## 
##                Factor1
## SS loadings       6.95
## Proportion Var    0.26
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 4491.49 on 324 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(test_versions %>% select(question = label, description), by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
question factor_loading description
A21_B18 0.6443385 (ln(sin))'
A22_B19 0.6263125 hyp.functions
A9_B8 0.6172979 trig.ineq.
A23_B20 0.6143346 slope tangent
A27_B23 0.6004427 int(exp)
A20_B17 0.6001530 (exp)'
A28_B24 0.5967679 Int = 0
A10_B9 0.5894189 trig.identity
A11_B10 0.5590591 period
A26_B22 0.5519692 int(poly)
A14_B12 0.5343762 limit
A5_B4 0.5264209 logs
A24_B21 0.5088626 IVT
A6_B5 0.4879943 logs
A3_B2 0.4842855 arithmetic rules
A7_B6 0.4775373 graph translation
A31_B27 0.4754865 int(abs)
A8_B7 0.4661955 sine pi/3
A17_B15 0.4633252 graph f'
A32_B28 0.4519476 FtoC algebra
A4_B3 0.4494958 Easy ineq.
A16_B14 0.4230310 diff.quotient
A18_B16 0.3965997 product rule
A29_B25 0.3592935 int even funct
A33_B29 0.3528981 difference vectors
A15_B13 0.3470535 series
A1_B1 0.2246966 linear function

3 Factor

Here we also investigate the 3-factor solution, to see whether these factors are interpretable.

fitfact3 <- factanal(item_scores_unchanged_only_and_no_na, factors = 3, rotation = "varimax")
print(fitfact3, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_unchanged_only_and_no_na, factors = 3,     rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.90    0.77    0.76    0.72    0.76    0.72    0.77    0.57    0.65    0.67 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.71    0.86    0.74    0.72    0.83    0.51    0.53    0.58    0.62    0.67 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29 
##    0.58    0.61    0.64    0.84    0.74    0.76    0.82 
## 
## Loadings:
##         Factor1 Factor2 Factor3
## A24_B21         0.50           
## A20_B17                 0.61   
## A21_B18 0.43            0.50   
## A26_B22         0.32    0.54   
## A1_B1                          
## A3_B2   0.30    0.30           
## A4_B3           0.39           
## A5_B4   0.41                   
## A6_B5           0.30           
## A7_B6           0.45           
## A8_B7   0.38                   
## A9_B8   0.50    0.41           
## A10_B9  0.45                   
## A11_B10 0.36    0.39           
## A14_B12 0.39                   
## A15_B13                        
## A16_B14 0.48                   
## A17_B15         0.47           
## A18_B16 0.32                   
## A22_B19 0.43            0.43   
## A23_B20 0.45    0.31           
## A27_B23 0.43            0.41   
## A28_B24 0.44    0.33           
## A29_B25                 0.33   
## A31_B27 0.37    0.33           
## A32_B28 0.42                   
## A33_B29         0.38           
## 
##                Factor1 Factor2 Factor3
## SS loadings       3.21    2.55    2.20
## Proportion Var    0.12    0.09    0.08
## Cumulative Var    0.12    0.21    0.29
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 1618.95 on 273 degrees of freedom.
## The p-value is 1.24e-189
load3 <- tidy(fitfact3)

ggplot(load3, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

main_factors <- load3 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)

library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
load3 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors) %>%
  left_join(test_versions %>% select(variable = label, description)) %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
## Joining, by = "variable"
## Joining, by = "variable"
Main Factor Question fl1 fl2 fl3 description
fl1 A9_B8 0.499 0.408 0.135 trig.ineq.
A16_B14 0.482 0.053 0.156 diff.quotient
A10_B9 0.453 0.288 0.257 trig.identity
A23_B20 0.447 0.31 0.289 slope tangent
A28_B24 0.438 0.33 0.246 Int = 0
A27_B23 0.427 0.2 0.408 int(exp)
A32_B28 0.424 0.14 0.188 FtoC algebra
A5_B4 0.406 0.26 0.224 logs
A14_B12 0.388 0.232 0.293 limit
A8_B7 0.382 0.231 0.171 sine pi/3
A31_B27 0.375 0.327 0.1 int(abs)
A18_B16 0.32 0.144 0.207 product rule
fl2 A24_B21 0.253 0.501 0.141 IVT
A17_B15 0.199 0.47 0.152 graph f’
A7_B6 0.187 0.451 0.213 graph translation
A11_B10 0.364 0.394 0.201 period
A4_B3 0.273 0.393 0.107 Easy ineq.
A33_B29 0.114 0.381 0.136 difference vectors
A3_B2 0.301 0.301 0.231 arithmetic rules
A6_B5 0.262 0.301 0.287 logs
A15_B13 0.241 0.279 0.07 series
A1_B1 0.003 0.278 0.136 linear function
fl3 A20_B17 0.236 0.256 0.607 (exp)’
A26_B22 0.157 0.319 0.541 int(poly)
A21_B18 0.434 0.19 0.501 (ln(sin))’
A22_B19 0.429 0.224 0.431 hyp.functions
A29_B25 0.201 0.1 0.332 int even funct

3. Fitting 2 parameter logistic MIRT model

We can fit a Multidimensional Item Response Theory (mirt) model. From the function definition:

mirt fits a maximum likelihood (or maximum a posteriori) factor analysis model to any mixture of dichotomous and polytomous data under the item response theory paradigm using either Cai's (2010) Metropolis-Hastings Robbins-Monro (MHRM) algorithm.

The process is to first fit the model, and save the result as a model object that we can then parse to get tabular or visual displays of the model that we might want. When fitting the model, we have the option to specify a few arguments, which then get interpreted as parameters to be passed to the model.

fit_2pl <- mirt(
  data = item_scores, # just the columns with question scores
  #removeEmptyRows = TRUE, 
  model = 1,          # number of factors to extract
  itemtype = "2PL",   # 2 parameter logistic model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -186638.319, Max-Change: 4.61593
Iteration: 2, Log-Lik: -173222.852, Max-Change: 3.34257
Iteration: 3, Log-Lik: -172025.296, Max-Change: 0.75461
Iteration: 4, Log-Lik: -171559.893, Max-Change: 0.30299
Iteration: 5, Log-Lik: -171310.677, Max-Change: 0.29565
Iteration: 6, Log-Lik: -171163.155, Max-Change: 0.15660
Iteration: 7, Log-Lik: -171055.541, Max-Change: 0.15756
Iteration: 8, Log-Lik: -170980.492, Max-Change: 0.10202
Iteration: 9, Log-Lik: -170924.986, Max-Change: 0.13257
Iteration: 10, Log-Lik: -170881.349, Max-Change: 0.08812
Iteration: 11, Log-Lik: -170846.140, Max-Change: 0.07497
Iteration: 12, Log-Lik: -170823.782, Max-Change: 0.04927
Iteration: 13, Log-Lik: -170807.153, Max-Change: 0.06731
Iteration: 14, Log-Lik: -170793.659, Max-Change: 0.03988
Iteration: 15, Log-Lik: -170783.377, Max-Change: 0.04419
Iteration: 16, Log-Lik: -170775.383, Max-Change: 0.02498
Iteration: 17, Log-Lik: -170768.350, Max-Change: 0.03335
Iteration: 18, Log-Lik: -170762.835, Max-Change: 0.01562
Iteration: 19, Log-Lik: -170757.759, Max-Change: 0.02149
Iteration: 20, Log-Lik: -170753.836, Max-Change: 0.01406
Iteration: 21, Log-Lik: -170750.152, Max-Change: 0.01118
Iteration: 22, Log-Lik: -170744.133, Max-Change: 0.01296
Iteration: 23, Log-Lik: -170741.889, Max-Change: 0.01003
Iteration: 24, Log-Lik: -170740.015, Max-Change: 0.00919
Iteration: 25, Log-Lik: -170732.968, Max-Change: 0.00538
Iteration: 26, Log-Lik: -170732.507, Max-Change: 0.00457
Iteration: 27, Log-Lik: -170732.168, Max-Change: 0.00410
Iteration: 28, Log-Lik: -170730.815, Max-Change: 0.00252
Iteration: 29, Log-Lik: -170730.746, Max-Change: 0.00278
Iteration: 30, Log-Lik: -170730.686, Max-Change: 0.00187
Iteration: 31, Log-Lik: -170730.559, Max-Change: 0.00125
Iteration: 32, Log-Lik: -170730.538, Max-Change: 0.00104
Iteration: 33, Log-Lik: -170730.519, Max-Change: 0.00099
Iteration: 34, Log-Lik: -170730.435, Max-Change: 0.00068
Iteration: 35, Log-Lik: -170730.427, Max-Change: 0.00035
Iteration: 36, Log-Lik: -170730.424, Max-Change: 0.00018
Iteration: 37, Log-Lik: -170730.419, Max-Change: 0.00019
Iteration: 38, Log-Lik: -170730.417, Max-Change: 0.00018
Iteration: 39, Log-Lik: -170730.416, Max-Change: 0.00016
Iteration: 40, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 41, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 42, Log-Lik: -170730.407, Max-Change: 0.00013
Iteration: 43, Log-Lik: -170730.403, Max-Change: 0.00056
Iteration: 44, Log-Lik: -170730.401, Max-Change: 0.00046
Iteration: 45, Log-Lik: -170730.399, Max-Change: 0.00046
Iteration: 46, Log-Lik: -170730.398, Max-Change: 0.00029
Iteration: 47, Log-Lik: -170730.397, Max-Change: 0.00011
Iteration: 48, Log-Lik: -170730.397, Max-Change: 0.00005
## 
## Calculating information matrix...

Local independence

We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with scores above 0.2 regarded as problematic (see DeMars, p. 48).

residuals_2pl  %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only a couple of pairs showing cause for concern:

residuals_2pl %>%
  rownames_to_column(var = "q1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "q2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(q1) < parse_number(q2)) %>%
  gt()
q1 q2 Q3_score
A18_B16 A19_B0 0.6739582
A34_B0 A35_B0 0.2105944

In fact, both of these pairs highlight questions that were removed from the test:

  • A18 and A19 are on the product and quotient rules, and only A18 was retained on the new test,

  • A34 and A35 are both about vectors; only A34 was retained (in modified form, as B30)

Given that this violation of the local independence assumption is very mild, we proceed using this model.

Model parameters

We then compute factor score estimates and augment the existing data frame with these estimates, to keep everything in one place. To do the estimation, we use the fscores() function from the mirt package which takes in a computed model object and computes factor score estimates according to the method specified. We will use the EAP method for factor score estimation, which is the “expected a-posteriori” method, the default. We specify it explicitly below, but the results would have been the same if we omitted specifying the method argument since it’s the default method the function uses.

test_scores <- test_scores %>%
  mutate(F1 = fscores(fit_2pl, method = "EAP"))

We can also calculate the model coefficient estimates using a generic function coef() which is used to extract model coefficients from objects returned by modeling functions. We will set the IRTpars argument to TRUE, which means slope intercept parameters will be converted into traditional IRT parameters.

coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)

The resulting object coefs is a list, with one element for each question, and an additional GroupPars element that we won’t be using. The output is a bit long, so we’re only showing a few of the elements here:

coefs_2pl[1:3]
## $A1_B1
##                a         b  g  u
## par     1.260879 -2.917451  0  1
## CI_2.5  1.137955 -3.127792 NA NA
## CI_97.5 1.383803 -2.707110 NA NA
## 
## $A2_B0
##                 a        b  g  u
## par     0.5536636 1.690986  0  1
## CI_2.5  0.4694646 1.425705 NA NA
## CI_97.5 0.6378627 1.956267 NA NA
## 
## $A3_B2
##                a          b  g  u
## par     1.294505 -0.6424149  0  1
## CI_2.5  1.225025 -0.6896012 NA NA
## CI_97.5 1.363985 -0.5952286 NA NA
# coefs_2pl[35:37]

Let’s take a closer look at the first element:

coefs_2pl[1]
## $A1_B1
##                a         b  g  u
## par     1.260879 -2.917451  0  1
## CI_2.5  1.137955 -3.127792 NA NA
## CI_97.5 1.383803 -2.707110 NA NA

In this output:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown

To make this output a little more user friendly, we should tidy it such that we have a row per question. We’ll do this in two steps. First, write a function that tidies the output for one question, i.e. one list element. Then, map this function over the list of all questions, resulting in a data frame.

tidy_mirt_coefs <- function(x){
  x %>%
    # melt the list element
    melt() %>%
    # convert to a tibble
    as_tibble() %>%
    # convert factors to characters
    mutate(across(where(is.factor), as.character)) %>%
    # only focus on rows where X2 is a or b (discrimination or difficulty)
    filter(X2 %in% c("a", "b")) %>%
    # in X1, relabel par (parameter) as est (estimate)
    mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
    # unite columns X2 and X1 into a new column called var separated by _
    unite(X2, X1, col = "var", sep = "_") %>%
    # turn into a wider data frame
    pivot_wider(names_from = var, values_from = value)
}

Let’s see what this does to a single element in coefs:

tidy_mirt_coefs(coefs_2pl[1])
## # A tibble: 1 x 7
##   L1    a_est a_CI_2.5 a_CI_97.5 b_est b_CI_2.5 b_CI_97.5
##   <chr> <dbl>    <dbl>     <dbl> <dbl>    <dbl>     <dbl>
## 1 A1_B1  1.26     1.14      1.38 -2.92    -3.13     -2.71

And now let’s map it over all elements of coefs:

# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_2pl <- map_dfr(head(coefs_2pl, -1), tidy_mirt_coefs, .id = "Question") %>% 
  left_join(test_versions, by = c("Question" = "label"))

A quick peek at the result:

tidy_2pl
## # A tibble: 41 x 12
##    Question a_est a_CI_2.5 a_CI_97.5   b_est b_CI_2.5 b_CI_97.5 pre   post 
##    <glue>   <dbl>    <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr> <chr>
##  1 A1_B1    1.26     1.14      1.38  -2.92    -3.13     -2.71   A1    B1   
##  2 A2_B0    0.554    0.469     0.638  1.69     1.43      1.96   A2    <NA> 
##  3 A3_B2    1.29     1.23      1.36  -0.642   -0.690    -0.595  A3    B2   
##  4 A4_B3    1.15     1.09      1.21  -0.568   -0.618    -0.519  A4    B3   
##  5 A5_B4    1.43     1.35      1.50   0.0508   0.0122    0.0894 A5    B4   
##  6 A6_B5    1.51     1.43      1.60  -0.967   -1.02     -0.917  A6    B5   
##  7 A7_B6    1.37     1.30      1.44  -0.817   -0.866    -0.767  A7    B6   
##  8 A8_B7    1.18     1.11      1.24  -0.176   -0.220    -0.132  A8    B7   
##  9 A9_B8    1.98     1.89      2.08   0.172    0.139     0.205  A9    B8   
## 10 A10_B9   1.74     1.66      1.83  -0.156   -0.191    -0.121  A10   B9   
## # ... with 31 more rows, and 3 more variables: description <chr>, notes <chr>,
## #   outcome <chr>

And a nicely formatted table of the result:

tidy_2pl %>%
  select(-pre,-post,-notes) %>% 
  group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("b_"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
  cols_label(
    a_est = "Est.",
    b_est = "Est.",
    a_CI_2.5 = "2.5%",
    b_CI_2.5 = "2.5%",
    a_CI_97.5 = "97.5%",
    b_CI_97.5 = "97.5%"
  )
Question Discrimination Difficulty description
Est. 2.5% 97.5% Est. 2.5% 97.5%
unchanged
A1_B1 1.26 1.14 1.38 −2.92 −3.13 −2.71 linear function
A3_B2 1.29 1.23 1.36 −0.64 −0.69 −0.60 arithmetic rules
A4_B3 1.15 1.09 1.21 −0.57 −0.62 −0.52 Easy ineq.
A5_B4 1.43 1.35 1.50 0.05 0.01 0.09 logs
A6_B5 1.51 1.43 1.60 −0.97 −1.02 −0.92 logs
A7_B6 1.37 1.30 1.44 −0.82 −0.87 −0.77 graph translation
A8_B7 1.18 1.11 1.24 −0.18 −0.22 −0.13 sine pi/3
A9_B8 1.98 1.89 2.08 0.17 0.14 0.20 trig.ineq.
A10_B9 1.74 1.66 1.83 −0.16 −0.19 −0.12 trig.identity
A11_B10 1.61 1.53 1.69 −0.30 −0.34 −0.26 period
A14_B12 1.47 1.40 1.55 −0.23 −0.27 −0.19 limit
A15_B13 0.83 0.78 0.88 0.21 0.16 0.27 series
A16_B14 1.41 1.32 1.49 1.34 1.28 1.41 diff.quotient
A17_B15 1.22 1.15 1.28 −0.39 −0.43 −0.34 graph f'
A18_B16 1.08 1.02 1.15 0.91 0.85 0.97 product rule
A20_B17 2.18 2.06 2.29 −0.75 −0.78 −0.71 (exp)'
A21_B18 2.10 2.00 2.20 −0.07 −0.10 −0.03 (ln(sin))'
A22_B19 1.99 1.90 2.09 −0.05 −0.08 −0.01 hyp.functions
A23_B20 1.94 1.85 2.03 0.26 0.23 0.30 slope tangent
A24_B21 1.41 1.34 1.49 −0.31 −0.35 −0.27 IVT
A26_B22 2.38 2.25 2.51 −1.06 −1.10 −1.02 int(poly)
A27_B23 1.89 1.80 1.99 0.32 0.29 0.36 int(exp)
A28_B24 1.85 1.76 1.94 0.23 0.20 0.27 Int = 0
A29_B25 0.83 0.78 0.88 0.00 −0.05 0.06 int even funct
A31_B27 1.29 1.23 1.36 0.45 0.40 0.49 int(abs)
A32_B28 1.52 1.44 1.61 1.22 1.16 1.28 FtoC algebra
A33_B29 0.99 0.93 1.06 −1.37 −1.46 −1.29 difference vectors
removed
A2_B0 0.55 0.47 0.64 1.69 1.43 1.96 3D
A12_B0 0.88 0.78 0.97 −1.11 −1.24 −0.98 rational exponents
A13_B0 0.71 0.62 0.79 0.86 0.73 1.00 ellipsoid
A19_B0 1.27 1.16 1.38 0.65 0.58 0.73 quotient rule
A25_B0 0.53 0.45 0.61 −1.14 −1.35 −0.94 velocity
A30_B0 1.25 1.13 1.37 −1.10 −1.20 −1.00 FtoC graph
A34_B0 0.89 0.80 0.99 −0.52 −0.62 −0.43 normal vector
A35_B0 1.26 1.15 1.37 0.74 0.66 0.82 intersect planes
A36_B0 1.23 1.11 1.35 1.30 1.19 1.41 parallel planes
added
A0_B11 0.77 0.70 0.84 −0.48 −0.56 −0.40 rational exponents
A0_B26 1.08 1.01 1.16 −0.15 −0.20 −0.09 FtoC graph
A0_B30 0.81 0.75 0.88 −0.59 −0.67 −0.51 normal vector
A0_B31 1.01 0.94 1.09 0.62 0.55 0.69 vector product
A0_B32 1.32 1.22 1.41 1.32 1.24 1.41 scalar product
tidy_2pl %>% 
  write_csv("data-eth/OUT_2pl-results.csv")
tidy_2pl %>% 
  mutate(qnum = parse_number(Question)) %>% 
  ggplot(aes(x = qnum, y = b_est, label = Question)) +
  geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0.2) +
  geom_text(colour = "grey") +
  geom_point() +
  theme_minimal() +
  labs(x = "Question",
       y = "Difficulty")

This shows the difficulty and discrimination of each of the questions, highlighting those that were added or removed:

tidy_2pl %>% 
  mutate(qnum = parse_number(Question)) %>% 
  ggplot(aes(
    x = a_est,
    y = b_est,
    label = case_when(
      outcome == "unchanged" ~ "",
      outcome == "removed" ~ pre,
      outcome == "added" ~ post
    ),
    colour = outcome
  )) +
  geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0, alpha = 0.5) +
  geom_errorbar(aes(xmin = a_CI_2.5, xmax = a_CI_97.5), width = 0, alpha = 0.5) +
  geom_text_repel() +
  geom_point() +
  theme_minimal() +
  labs(x = "Discrimination",
       y = "Difficulty")

Comparing years and classes

Do students from different programmes of study have different distributions of ability?

Differences between years

Compare the distribution of abilities in the year groups (though in this case there is only one).

ggplot(test_scores, aes(F1, y = year, fill = as.factor(year), colour = as.factor(year))) +
  geom_density_ridges(alpha=0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by year of taking the test", 
       x = "Ability", y = "Density",
       fill = "Year", colour = "Year") +
  theme_minimal()
## Picking joint bandwidth of 0.189

Information curves

Test information curve

plot(fit_2pl, type = "infoSE", main = "Test information")

Item information curves

plot(fit_2pl, type = "infotrace", main = "Item information curves")

Response curves

Test response curves

plot(fit_2pl, type = "score", auto.key = FALSE)

Item response curves

We can get individual item surface and information plots using the itemplot() function from the mirt package, e.g.

mirt::itemplot(fit_2pl, item = 1, 
               main = "Trace lines for item 1")

We can also get the plots for all trace lines, one facet per plot.

plot(fit_2pl, type = "trace", auto.key = FALSE)

Or all of them overlaid in one plot.

plot(fit_2pl, type = "trace", facet_items=FALSE)

An alternative approach is using ggplot2 and plotly to add interactivity to make it easier to identify items.

# store the object
plt <- plot(fit_2pl, type = "trace", facet_items = FALSE)
# the data we need is in panel.args
# TODO - I had to change the [[1]] to [[2]] since the plt has two panels for some reason, with the one we want being the 2nd panel
plt_data <- tibble(
  x          = plt$panel.args[[2]]$x,
  y          = plt$panel.args[[2]]$y,
  subscripts = plt$panel.args[[2]]$subscripts,
  item       = rep(colnames(item_scores), each = 200)
) %>%
  mutate(
    item_no = str_remove(item, "A") %>% as.numeric(),
    item    = fct_reorder(item, item_no)
    )

head(plt_data)
## # A tibble: 6 x 5
##       x      y subscripts item  item_no
##   <dbl>  <dbl>      <int> <fct>   <dbl>
## 1 -6    0.0201        201 A1_B1      NA
## 2 -5.94 0.0217        202 A1_B1      NA
## 3 -5.88 0.0233        203 A1_B1      NA
## 4 -5.82 0.0251        204 A1_B1      NA
## 5 -5.76 0.0271        205 A1_B1      NA
## 6 -5.70 0.0291        206 A1_B1      NA
plt_gg <- ggplot(plt_data, aes(x, y, 
                          colour = item, 
                          text = item)) + 
  geom_line() + 
  labs(
    title = "2PL - Trace lines",
    #x = expression(theta),
    x = "theta",
    #y = expression(P(theta)),
    y = "P(theta)",
    colour = "Item"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

ggplotly(plt_gg, tooltip = "text")

4. Factor analysis for the new test only

Here we redo the factor analysis, but using only the data from the new version of the test

item_scores_B <- test_scores %>% 
  select(-F1) %>% 
  select(-contains("B0")) %>% 
  filter(test_version == "post") %>% 
  select(-class, -year, -test_version)
structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.97).
    • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(496) = 44502.66, p < .001).

Method Agreement Procedure:

The choice of 1 dimensions is supported by 6 (26.09%) methods out of 23 (Acceleration factor, R2, VSS complexity 1, Velicer’s MAP, TLI, RMSEA).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 6
2 1
3 2
4 5
5 2
6 1
12 2
24 1
29 3
#n %>% tibble() %>% gt()
fa.parallel(item_scores_B, fa = "fa")

## Parallel analysis suggests that the number of factors =  7  and the number of components =  NA

1 Factor

fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.95    0.78    0.80    0.73    0.77    0.76    0.78    0.62    0.66    0.70 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.71    0.88    0.83    0.78    0.86    0.66    0.60    0.61    0.62    0.73 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29  A0_B11  A0_B26  A0_B30 
##    0.70    0.65    0.64    0.88    0.76    0.80    0.86    0.89    0.81    0.88 
##  A0_B31  A0_B32 
##    0.84    0.83 
## 
## Loadings:
##  [1] 0.52 0.62 0.58 0.55 0.54 0.58 0.63 0.62 0.62 0.52 0.54 0.59 0.60      0.47
## [16] 0.45 0.48 0.49 0.47 0.35 0.41 0.47 0.38 0.34 0.49 0.45 0.37 0.33 0.44 0.34
## [31] 0.41 0.41
## 
##                Factor1
## SS loadings       7.63
## Proportion Var    0.24
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 3991.49 on 464 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(test_versions %>% select(question = label, description), by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
question factor_loading description
A21_B18 0.6335602 (ln(sin))'
A22_B19 0.6229067 hyp.functions
A9_B8 0.6173382 trig.ineq.
A23_B20 0.6167307 slope tangent
A28_B24 0.5989062 Int = 0
A27_B23 0.5934801 int(exp)
A20_B17 0.5839634 (exp)'
A10_B9 0.5823615 trig.identity
A11_B10 0.5489394 period
A26_B22 0.5440885 int(poly)
A14_B12 0.5367772 limit
A5_B4 0.5215381 logs
A24_B21 0.5153658 IVT
A31_B27 0.4918262 int(abs)
A7_B6 0.4910048 graph translation
A6_B5 0.4757299 logs
A3_B2 0.4702819 arithmetic rules
A8_B7 0.4697829 sine pi/3
A17_B15 0.4670353 graph f'
A32_B28 0.4470902 FtoC algebra
A4_B3 0.4454821 Easy ineq.
A0_B26 0.4373808 FtoC graph
A16_B14 0.4142581 diff.quotient
A0_B32 0.4094848 scalar product
A0_B31 0.4058509 vector product
A18_B16 0.3802267 product rule
A33_B29 0.3700329 difference vectors
A15_B13 0.3531196 series
A0_B30 0.3422352 normal vector
A29_B25 0.3420163 int even funct
A0_B11 0.3327412 rational exponents
A1_B1 0.2246888 linear function

The questions that load most strongly on this factor are very standard calculations in integration, differentiation, and trigonometry – which is consistent with the aim of the test.

4 Factor

Here we also investigate the 4-factor solution, to see whether these factors are interpretable.

fitfact4 <- factanal(item_scores_B, factors = 4, rotation = "varimax")
print(fitfact4, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 4, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.90    0.73    0.73    0.71    0.75    0.72    0.76    0.58    0.64    0.69 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.69    0.85    0.75    0.72    0.84    0.54    0.53    0.58    0.62    0.63 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29  A0_B11  A0_B26  A0_B30 
##    0.55    0.61    0.63    0.83    0.72    0.76    0.81    0.88    0.72    0.85 
##  A0_B31  A0_B32 
##    0.79    0.75 
## 
## Loadings:
##         Factor1 Factor2 Factor3 Factor4
## A20_B17                  0.56          
## A26_B22          0.31    0.57          
## A1_B1                                  
## A3_B2    0.35    0.34                  
## A4_B3    0.31    0.40                  
## A5_B4    0.41                          
## A6_B5            0.32                  
## A7_B6            0.42                  
## A8_B7    0.37                          
## A9_B8    0.42    0.36            0.32  
## A10_B9   0.46                          
## A11_B10  0.34    0.36                  
## A14_B12  0.40                          
## A15_B13                                
## A16_B14  0.44                          
## A17_B15          0.43                  
## A18_B16                                
## A21_B18  0.44            0.47          
## A22_B19  0.40            0.41          
## A23_B20  0.39                          
## A24_B21          0.47            0.33  
## A27_B23  0.37            0.41          
## A28_B24  0.32                    0.34  
## A29_B25                  0.36          
## A31_B27                          0.33  
## A32_B28  0.33                          
## A33_B29          0.38                  
## A0_B11                                 
## A0_B26           0.39            0.33  
## A0_B30                                 
## A0_B31                           0.33  
## A0_B32                           0.41  
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings       2.82    2.54    2.20    1.57
## Proportion Var    0.09    0.08    0.07    0.05
## Cumulative Var    0.09    0.17    0.24    0.29
## 
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 1203.85 on 374 degrees of freedom.
## The p-value is 7.2e-88
load4 <- tidy(fitfact4)

ggplot(load4, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

main_factors <- load4 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)

library(kableExtra)
load4 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors) %>%
  left_join(test_versions %>% select(variable = label, description)) %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
## Joining, by = "variable"
## Joining, by = "variable"
Main Factor Question fl1 fl2 fl3 fl4 description
fl1 A10_B9 0.456 0.276 0.229 0.157 trig.identity
A16_B14 0.44 0.03 0.131 0.206 diff.quotient
A9_B8 0.416 0.361 0.128 0.324 trig.ineq.
A5_B4 0.412 0.264 0.195 0.126 logs
A14_B12 0.403 0.225 0.277 0.13 limit
A23_B20 0.393 0.268 0.287 0.274 slope tangent
A8_B7 0.374 0.224 0.16 0.145 sine pi/3
A3_B2 0.346 0.34 0.175 0.022 arithmetic rules
A32_B28 0.326 0.082 0.208 0.288 FtoC algebra
A18_B16 0.269 0.098 0.193 0.2 product rule
fl2 A24_B21 0.137 0.47 0.136 0.334 IVT
A17_B15 0.152 0.43 0.139 0.23 graph f’
A7_B6 0.213 0.422 0.2 0.13 graph translation
A4_B3 0.309 0.4 0.057 0.085 Easy ineq.
A0_B26 0.115 0.388 0.086 0.333 FtoC graph
A33_B29 0.085 0.383 0.138 0.143 difference vectors
A11_B10 0.339 0.362 0.194 0.175 period
A6_B5 0.267 0.318 0.264 0.067 logs
A1_B1 0.025 0.28 0.143 -0.011 linear function
A15_B13 0.202 0.256 0.049 0.201 series
A0_B11 0.175 0.238 0.082 0.17 rational exponents
fl3 A26_B22 0.122 0.308 0.568 0.115 int(poly)
A20_B17 0.276 0.261 0.556 0.063 (exp)’
A21_B18 0.438 0.185 0.473 0.145 (ln(sin))’
A22_B19 0.402 0.197 0.414 0.22 hyp.functions
A27_B23 0.368 0.162 0.407 0.251 int(exp)
A29_B25 0.143 0.052 0.356 0.147 int even funct
A0_B30 0.093 0.135 0.26 0.232 normal vector
fl4 A0_B32 0.229 0.108 0.125 0.41 scalar product
A28_B24 0.322 0.287 0.26 0.343 Int = 0
A0_B31 0.112 0.178 0.246 0.328 vector product
A31_B27 0.293 0.283 0.097 0.326 int(abs)

TODO add comments from Meike here

The first factor is again calculations, but this time only in calculus (i.e. integrals and derivatives).

The second factor seems to be something like “abstract stuff”, it has to do with limits, rules for logarithms etc.

I guess that could be a category of its own.

The third one is interesting. It’s clearly graphical interpretations. All in different settings, but clearly belonging together.

And of the fourth factor I cannot make sense

Packages

In this analysis we used the following packages. You can learn more about each one by clicking on the links below.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots